Error probability analysis in quantum tomography: a tool for evaluating experiments 




Takanori Sugiyama,1 Peter S. Turner,1 and Mio Murao1,2

1 Department of Physics, Graduate School of Science, The University of Tokyo, 

7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan 113-0033.

2 Institute for Nano Quantum Information Electronics, The University of Tokyo, 

4-6-1 Komaba, Meguro-ku, Tokyo, Japan 153-8505.
(Dated: December 7, 2010) 

We expand the scope of the statistical notion of error probability, i.e., how often large deviations 
are observed in an experiment, in order to make it directly applicable to quantum tomography. We 
verify that the error probability can decrease at most exponentially in the number of trials, derive 
the explicit rate that bounds this decrease, and show that a maximum likelihood estimator achieves 
this bound. We also show that the statistical notion of identifiability coincides with the tomographic 
notion of informational completeness. Our result implies that two quantum tomographic apparatuses 
that have the same risk function, (e.g. variance), can have different error probability, and we give an 
example in one qubit state tomography. Thus by combining these two approaches we can evaluate, 
in a reconstruction independent way, the performance of such experiments more discerningly. 
I. INTRODUCTION

Many applications that make use of "quantumness" in 
order to outperform their classical counterparts have re- 
cently been proposed, especially in the field of quantum 
information. One of the main reasons for this increase has 
been the dramatic development of experimental technolo- 
gies, and many of the proposals have already given rise 
to experimentally realizable applications [1]. To confirm
whether or not an apparatus constructed for an applica- 
tion works well, we need to compare its performance to a 
theoretical model. The standard method used for such a thorough
comparison is called quantum tomography [2].
This paper is concerned mainly with the question of how 
to evaluate measurement apparatuses used in quantum 
tomography. 

The theory of quantum tomography consists of ex- 
perimental design methods and reconstruction schemes. 
Known parts of the experimental apparatus in a quantum 
tomographic experiment, (or at least those parts assumed 
to be known), are together called the tester. Experimen- 
tal design methods are concerned with how good (or bad) 
the tester is for estimating the mathematical representa- 
tion of the tomographic object (e.g. a quantum state, or 
a process). Usually the goodness of the tester is evaluated 
by the error of the estimation result from its experimental
data set. In real experiments, we cannot perform an
infinite number of trials; we need to estimate the true
tomographic object from a finite number. This estimation
procedure is called an estimator in statistical estimation 
theory and a reconstruction scheme in quantum tomog- 
raphy. The error of the estimation result depends upon 
the reconstruction scheme, and when evaluating a tester's 
performance, we usually focus upon the error in the case
where the best reconstruction scheme is used. 

Evaluating estimation errors on the reconstructed ob- 
ject is a problem of statistical estimation theory. There 
are two main approaches; one is to use a risk function 
and the other is to use error probability. We measure the 
difference between the true object and the estimate by 
a loss function. A risk function is the average value of 
the loss function. As the number of independent, iden- 
tically distributed (iid) trials increases, it is known that 
the error, given by the risk function, of any unbiased
estimator is bounded from below by the Cramer-Rao bound,
and a maximum likelihood estimator achieves the bound
asymptotically [3]. The application of the Cramer-Rao
inequality to quantum tomography is studied in [343 ■ 
On the other hand, an error probability is the probabil- 
ity that large deviations of the loss function are observed. 
It has been shown that the error probability can decrease 
at most exponentially Q, and under some conditions, the 
bound is achieved (asymptotically) by a maximum
likelihood estimator [9]. However, the explicit form of this
bound has not been shown except for the case where the 
estimation setting can be reduced to one parameter esti- 
mation or the loss function is a Euclidean norm 
(in [13] the applicability of the proof used for a Euclidean
norm to more general loss functions is discussed). In general,
the estimated object has multiple parameters, and
the choice of the loss function depends upon the pur- 
pose of the experiment. A mean squared error can be 
unsuitable for some situations, especially those arising 
in quantum tomography. In order to be more useful in 
practice, the explicit form of the bound is needed in more 
generality. 

In this paper, by using Sanov's theorem [14, 15] from
large deviation theory, we derive the error probability in- 
equality bounding general loss functions on a finite mul- 
tiparameter space. We prove that a maximum likelihood 
estimator achieves the equality under some conditions 
(which are satisfied in quantum tomography) and give
the explicit form of the lower bound. Our result indicates
that two testers with the same value of their risk functions
can be different from an error probability viewpoint,
which allows for more discerning comparisons of testers 
in quantum tomography. We also show that the required 
conditions for our inequality hold not only for tomography
of quantum states, but also for that of quantum
instruments [16], which includes process and measurement
tomography as specific cases. 

In section II we overview the theory of quantum tomography
using state tomography as an example. In
section III we review classical statistical estimation theory,
introducing the necessary aspects of error probability
theory. In section IV we give the main theorem and
some analysis, which includes an example; the proof of
the theorem is given in the Appendix. In section V we
discuss some open problems, and conclude with a summary
in section VI.



II. OVERVIEW OF QUANTUM TOMOGRAPHY 

Quantum tomography is classified by the tomographic
object to be reconstructed: state tomography [17-19]
treats density operators, which describe states of quan- 
tum systems; process tomography treats linear, 
trace-preserving, and completely positive maps, which 
describe deterministic state transitions; POVM tomog- 
raphy [2|| H(| treats positive operator-valued measures 
(POVMs), which are sets of positive-semidefinite operators
describing the probability distributions obtained
by measurements; instrument tomography [16] treats
quantum instruments, which are sets of linear, trace- 
decreasing, completely positive maps describing both 
probability distributions and state transitions caused by 
measurements. Here we briefly overview the theory of 
quantum state tomography, for simplicity. 

The purpose of quantum state tomography is to identify
the density operator characterizing the state of the
quantum system of interest. Let $\mathcal{H}$ and $\mathcal{S}(\mathcal{H})$ denote the
Hilbert space corresponding to the system and the set of
all density operators on $\mathcal{H}$, respectively. We assume that
the dimension $d = \dim\mathcal{H}$ is finite. A density operator
$\hat\rho$ on $\mathcal{H}$ can be linearly and bijectively parametrized by
$d^2 - 1$ independent real variables, i.e., $\hat\rho = \hat\rho(s)$,
where $s$ is in $\mathbb{R}^{d^2-1}$. Let us define the set of all parameters
$S := \{ s \in \mathbb{R}^{d^2-1} \mid \hat\rho(s) \in \mathcal{S}(\mathcal{H}) \}$. Identifying the
true density operator $\hat\rho \in \mathcal{S}(\mathcal{H})$ is equivalent to identifying
the true parameter $s \in S$. Let $\Pi = \{\hat\Pi_x\}_{x\in\Omega}$ denote
the POVM characterizing the tester used in the tomographic
experiment [29], where $\Omega := \{1, \ldots, M\}$. When
the true density operator is $\hat\rho(s)$, the probability distribution
$p_s$ describing the tomographic experiment is given
by

$$p_s(x) = \mathrm{Tr}[\hat\rho(s)\hat\Pi_x], \quad x \in \Omega, \tag{1}$$
where Tr denotes the trace operation with respect to $\mathcal{H}$.
(Note that in subsection IV C a different trace operation,
tr, is introduced.) Suppose that we perform $N$ measurement
trials and obtain a data set $x^N = (x_1, \ldots, x_N)$,
where $x_i \in \Omega$ is the outcome observed in the $i$-th trial.
Let $N_x$ denote the number of times that outcome $x$ occurs
in $x^N$; then $f_N(x) := N_x/N$ is the relative frequency
of $x$ for the data set $x^N$. In the limit $N \to \infty$,
the relative frequency converges to the true probability
$p_s(x)$. A tester is called informationally complete if
$\mathrm{Tr}[\hat\rho(s)\hat\Pi_x] = \mathrm{Tr}[\hat\rho'\hat\Pi_x]$ (for all $x \in \Omega$) has a unique
solution $\hat\rho'$ for arbitrary $\hat\rho(s) \in \mathcal{S}(\mathcal{H})$ [30]. This condition is equivalent
to that of the tester POVM $\Pi$ being a basis for the set
of all Hermitian matrices on $\mathcal{H}$. For finite $N$, the relative
frequency and true probability are generally not the
same, i.e., there is unavoidable statistical error, and we
need to choose an estimation procedure that takes the
experimental result $x^N$ to a density operator, i.e., a
reconstruction scheme.
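The gap between relative frequency and true probability at finite $N$ can be illustrated with a minimal sketch. The three-outcome distribution `true_probs` below is a hypothetical example chosen for illustration only; it is not taken from any tester discussed in this paper.

```python
import random
from collections import Counter

def relative_frequency(data, outcomes):
    """Return f_N(x) = N_x / N for every outcome x in `outcomes`."""
    n_trials = len(data)
    counts = Counter(data)
    return {x: counts[x] / n_trials for x in outcomes}

# Hypothetical true distribution over M = 3 outcomes (illustrative values).
outcomes = [1, 2, 3]
true_probs = [0.5, 0.3, 0.2]

rng = random.Random(0)  # fixed seed so that the run is repeatable
for n_trials in (10, 1000, 100000):
    data = rng.choices(outcomes, weights=true_probs, k=n_trials)
    freq = relative_frequency(data, outcomes)
    # Largest deviation between f_N(x) and the true probability.
    worst = max(abs(freq[x] - p) for x, p in zip(outcomes, true_probs))
    print(n_trials, round(worst, 4))
```

The largest deviation typically shrinks as the number of trials grows, which is the statistical error that a reconstruction scheme has to cope with.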

Reconstruction schemes are concerned with how best 
to derive the mathematical representation of the tomo- 
graphic object from the obtained experimental data, and 
are called estimators in statistical estimation theory, 
where the analysis of the estimation precision (or esti- 
mation error) is very important. In actual experiments, 
there are two sources of imprecision: statistical errors and 
systematic errors. As mentioned above, statistical error 
is caused by the finiteness of the total number of measure- 
ment trials, and is unavoidable in principle. Systematic 
error is caused by our lack of knowledge about the tester, 
that is, the difference between the true tester and what 
we believe to be the true tester. Usually, the effect of the 
systematic error is approximated by introducing a model, 
and is assumed to be known. Therefore, the analysis of 
the estimation error is usually reduced to that of the sta- 
tistical error. To date at least five reconstruction schemes 
have been proposed, namely linear Il7, 120I l2ll [25| . 
maximum likelihood mHaH, Bayesian )2Mm, max- 
imum entropy 34 1, and norm minimization [351 ] . The 
effect of statistical errors on the reconstructed object de- 
pends upon the scheme used, hence the main problem is 
how to quantify the effect of the statistical error on the 
reconstructed object, and how to do so as rigorously as 
possible. 

It is natural to consider a linear reconstruction scheme,
which demands that we find a $d \times d$ matrix $\hat\rho^{l}$ satisfying

$$\mathrm{Tr}[\hat\rho^{l}\hat\Pi_x] = f_N(x), \quad x \in \Omega. \tag{2}$$

However, Eq. (2) does not always have a solution, and
even when it does, although the solution is Hermitian
and normalized, it is not guaranteed that $\hat\rho^{l}$ is positive
semidefinite. A maximum likelihood reconstruction
scheme addresses these problems. The estimated matrix
$\hat\rho^{ml}$ is defined as

$$\hat\rho^{ml} := \operatorname{argmax}_{\hat\rho \in \mathcal{S}(\mathcal{H})} \prod_{i=1}^{N} \mathrm{Tr}[\hat\rho\,\hat\Pi_{x_i}]. \tag{3}$$

It can be shown that when $\hat\rho^{l} \in \mathcal{S}(\mathcal{H})$, $\hat\rho^{l} = \hat\rho^{ml}$
holds.
We will concern ourselves with maximum likelihood
reconstructions here, as we will see that they are optimal
in the sense that they can saturate the bounds we are 
considering. 
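The failure of positivity in linear reconstruction can be seen in a small sketch. It assumes the 1-qubit 6-state tester analyzed in subsection IV D with outcome probabilities $(1 \pm s_i)/6$, and it assumes frequencies for which Eq. (2) is solvable (each axis pair sums to 1/3). The function names and the sample counts are hypothetical.

```python
import math

def linear_inversion(freq):
    """Solve Eq. (2) for the Bloch vector of a qubit.

    `freq` lists relative frequencies in the order (+x, -x, +y, -y, +z, -z).
    With p(+-i) = (1 +- s_i) / 6, the solution is s_i = 3 (f(+i) - f(-i)).
    """
    return [3 * (freq[2 * i] - freq[2 * i + 1]) for i in range(3)]

def is_physical(s):
    """A qubit Bloch vector gives a density operator iff its norm is <= 1."""
    return math.sqrt(sum(c * c for c in s)) <= 1 + 1e-12

# Exact probabilities for s = (0.6, 0, 0): the inversion recovers s.
exact = [(1 + 0.6) / 6, (1 - 0.6) / 6, 1 / 6, 1 / 6, 1 / 6, 1 / 6]
print(linear_inversion(exact), is_physical(linear_inversion(exact)))

# Hypothetical counts from N = 12 trials on a pure state near the z axis.
counts = [3, 1, 2, 2, 4, 0]
noisy = [c / 12 for c in counts]
print(linear_inversion(noisy), is_physical(linear_inversion(noisy)))
```

In the second case the linear estimate lies outside the Bloch ball, so the reconstructed matrix is not positive semidefinite. This is the situation that a maximum likelihood reconstruction avoids by restricting the search to $\mathcal{S}(\mathcal{H})$.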



III. OVERVIEW OF CLASSICAL STATISTICAL 
ESTIMATION 

In this section, we introduce the notation and termi- 
nology of classical statistical estimation theory that we 
use to arrive at our main results. We also review the 
necessary aspects of error probability. For the reader fa- 
miliar with quantum Fisher matrices, we justify our use 
of classical estimation theory, or the classical Fisher
matrix, in subsection V D.

Let $(\Omega, \mathcal{B}, P)$ be a probability space, denoting a sample
space, a Borel algebra of subsets of the sample space,
and a measure that assigns probabilities to those subsets,
respectively (the Borel structure simply assures that
combinations of subsets of events get assigned probabilities
in a sensible fashion, e.g. $P(Y \cup Z) = P(Y) + P(Z)$
for disjoint $Y$ and $Z$). Define the $N$-fold direct product
$\Omega^N := \Omega \times \cdots \times \Omega$ as the space of sequences of events.
Let $x^N = \{x_1, \ldots, x_N\}$, $x_i \in \Omega$, be a sequence of iid
observations of the sample space $\Omega$. We assume that the
sample space is finite (see subsection V B for a discussion
of infinite spaces). Suppose that the probability space
admits a statistical model $\mathcal{P}_\Theta = \{P_\theta \mid \theta \in \Theta\}$ that
assigns a valid probability measure to each parameter $\theta$
in $\Theta$, which is a subset of the $k$-dimensional Euclidean
space $\mathbb{R}^k$ whose closure $\bar\Theta$ is compact and whose interior $\Theta^o$
is open. The quantum state parameter space $S$ from the
last section is an example of such a $\Theta$, where the
statistical model is given by Eq. (1). We assume that each
measure $P_\theta$ has a probability distribution $\{p_\theta(x)\}_{x\in\Omega}$
satisfying $P_\theta(Y) = \sum_{x \in Y} p_\theta(x)$ where $Y \in \mathcal{B}$. A probability
measure $P_\theta$ and the probability distribution $p_\theta$ have
a one-to-one correspondence for any $\theta \in \Theta$, and we do
not distinguish between $\mathcal{P}_\Theta$ and $\{p_\theta ; \theta \in \Theta\}$. Let $\mathcal{P}(\Omega)$
denote the set of all probability distributions with the
sample space $\Omega$; then $\mathcal{P}_\Theta \subset \mathcal{P}(\Omega)$. Let $P^{(N)}_\theta$ denote the
$N$-fold product probability measure $P_\theta \times \cdots \times P_\theta$.

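Let $g$ denote a map from the parameter space $\Theta$ to
a metric space $\Gamma$. An estimator of $g(\theta)$ is a set of maps
$\varphi = \{\varphi_1, \varphi_2, \ldots\}$ (one for each number of trials $N$) from
observation results $x^N$ to $\Gamma$. Each $\varphi_N(x^N)$ is called the
estimate of $x^N$. A maximum likelihood estimator $\theta^{ml} =
\{\theta^{ml}_1, \theta^{ml}_2, \ldots\}$ of $\theta$ is defined as

$$\theta^{ml}_N(x^N) := \operatorname{argsup}_{\theta \in \Theta} P^{(N)}_\theta(\{x^N\}). \tag{4}$$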



A map $D$ from $\Gamma \times \Gamma$ to $\mathbb{R}$ is called a loss function on $\Gamma$
when $D$ satisfies the following two conditions: (i) $\forall a, b \in
\Gamma$, $D(a,b) \ge 0$, (ii) $\forall a \in \Gamma$, $D(a,a) = 0$. We introduce
three additional conditions: (iii) $\forall a, b \in \Gamma$, $D(a,b) =
D(b,a)$, (iv) $\forall a, b, c \in \Gamma$, $D(a,b) \le D(a,c) + D(c,b)$, (v)
$\forall a, b \in \Gamma$, $D(a,b) = 0 \Rightarrow a = b$. A loss function satisfying
conditions (iii) and (iv) is called a semi-distance, and
a semi-distance satisfying condition (v) is called a distance.
For example, let us define a function $g$ from $\Theta$ to $\mathbb{R}$
as $g(\theta) = \|\theta\|$, $\theta \in \Theta$, where $\|\cdot\|$ is the Euclidean norm on
$\mathbb{R}^k$. Then $|g(\theta) - g(\theta')|$ is a semi-distance on $\Theta$ ($\theta, \theta' \in \Theta$)
and $|a - b|$ is a distance on $\mathbb{R}$ ($a, b \in \mathbb{R}$). In general, a
loss function is not necessarily a distance. A loss function
satisfying condition (v) is called a pseudo-distance
[3^ . The Kullback-Leibler divergence introduced below 
is an example of a pseudo-distance that is not also a distance.
If a loss function $D$ on $\mathbb{R}^k$ is sufficiently smooth
and it can be approximated by the Hesse matrix $H_a$ up
to second order, then $H_a$ is positive semidefinite for all
$a \in \mathbb{R}^k$ from condition (i), and if the loss function $D$
is a pseudo-distance, then $H_a$ is positive definite for all
$a \in \mathbb{R}^k$.
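The following sketch checks conditions (i), (ii) and (iii) numerically for the Kullback-Leibler divergence, using its standard definition $K(q\|p) = \sum_x q(x)\log[q(x)/p(x)]$. The two distributions are hypothetical values chosen for illustration, and the code assumes $p(x) > 0$ wherever $q(x) > 0$.

```python
import math

def kl_divergence(q, p):
    """K(q||p) = sum_x q(x) log(q(x) / p(x)), with the convention 0 log 0 = 0."""
    total = 0.0
    for qx, px in zip(q, p):
        if qx > 0:
            total += qx * math.log(qx / px)
    return total

p = [0.5, 0.5]  # hypothetical distributions on a two-point sample space
q = [0.9, 0.1]

print(kl_divergence(p, p))                       # condition (ii)
print(kl_divergence(q, p), kl_divergence(p, q))  # condition (iii) fails
```

The two values in the last line differ, so the divergence is not symmetric. It is non-negative and vanishes only for identical distributions, which is why it qualifies as a pseudo-distance but not as a distance.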

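There are at least two methods to evaluate an estimation
error by using a loss function. One is a method using
risk functions. An $N$-trial risk function $\bar{D}^{(N)}$ is defined
as the expectation value of the loss function between an
estimate and the true object, given by $\theta$;

$$\bar{D}^{(N)}_\theta := E^{(N)}_\theta\big[D(\varphi_N(x^N), g(\theta))\big], \tag{5}$$

where $E^{(N)}_\theta[f(x^N)] = \sum_{x^N \in \Omega^N} P^{(N)}_\theta(x^N) f(x^N)$ is the
expectation value of a function $f$ on $\Omega^N$. When $\Gamma = \mathbb{R}^l$,
for any unbiased estimator ($\varphi$ satisfying $E^{(N)}_\theta[\varphi_N(x^N)] =
g(\theta)$ for any $N$ and $\theta \in \Theta$), the Cramer-Rao inequality

$$E^{(N)}_\theta\big[(\varphi_N(x^N) - g(\theta))(\varphi_N(x^N) - g(\theta))^T\big]
\ge \frac{1}{N}\left(\frac{\partial g}{\partial\theta}\right)^T F_\theta^{-1}\,\frac{\partial g}{\partial\theta} \tag{6}$$

holds under some regularity conditions, where $(\partial g/\partial\theta)_{\alpha\beta} :=
\partial g_\beta/\partial\theta_\alpha$ ($\alpha = 1, \ldots, k$; $\beta = 1, \ldots, l$) is the Jacobian and $F_\theta^{-1}$
is the Moore-Penrose generalized inverse of the Fisher
matrix
$F_\theta := \sum_{x \in \Omega} p_\theta(x)[\nabla_\theta \log p_\theta(x)][\nabla_\theta \log p_\theta(x)]^T$.
Asymptotically a maximum likelihood estimator achieves the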
equality under some conditions Q. 
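A minimal sketch of the Fisher matrix for a finite sample space may help here. It uses the identity $p\,[\nabla\log p][\nabla\log p]^T = \nabla p\,\nabla p^T/p$, and it is applied to a hypothetical one-parameter coin model that is not part of this paper. The function name `fisher_matrix` is an assumption of the sketch.

```python
def fisher_matrix(probs, grads):
    """Fisher matrix of a finite model.

    probs[x] is p_theta(x) and grads[x] is the gradient of p_theta(x)
    with respect to theta, given as a list of k partial derivatives.
    """
    k = len(grads[0])
    fisher = [[0.0] * k for _ in range(k)]
    for px, gx in zip(probs, grads):
        if px == 0:
            continue  # zero-probability outcomes are skipped in this sketch
        for a in range(k):
            for b in range(k):
                fisher[a][b] += gx[a] * gx[b] / px
    return fisher

# Hypothetical model: a coin with p(heads) = theta and p(tails) = 1 - theta.
theta = 0.25
F = fisher_matrix([theta, 1 - theta], [[1.0], [-1.0]])

n_trials = 1000
# Cramer-Rao bound on the variance of an unbiased estimator of theta.
print(F[0][0], 1 / (n_trials * F[0][0]))
```

For this model the Fisher information is $1/[\theta(1-\theta)]$, so the printed bound equals $\theta(1-\theta)/N$, the variance of the sample mean.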

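The other is a method using error probabilities. We
call

$$P^{(N)}_\epsilon(\theta) := P^{(N)}_\theta\big(D(\varphi_N(x^N), g(\theta)) > \epsilon\big) \tag{7}$$
$$= P^{(N)}_\theta\big(\{x^N \in \Omega^N ;\ D(\varphi_N(x^N), g(\theta)) > \epsilon\}\big) \tag{8}$$

an error probability with a threshold $\epsilon > 0$. An estimator
$\varphi$ is called (weakly) consistent in the loss function $D$ if

$$P^{(N)}_\epsilon(\theta) \to 0 \quad \text{as} \quad N \to \infty \tag{9}$$

holds for any $\epsilon > 0$. The conditions under which a maximum
likelihood estimator is consistent include the identifiability
condition on a statistical model $\mathcal{P}_\Theta$: for any
$\theta \in \Theta^o$ and $\theta' \in \Theta$, if $\theta \ne \theta'$, then there exists at least
one outcome $x \in \Omega$ satisfying $p_\theta(x) \ne p_{\theta'}(x)$. Let
us define

$$R_\epsilon(\theta) := \inf_{\theta' \in \Theta}\big\{K(p_{\theta'}\|p_\theta) ;\ D(g(\theta'), g(\theta)) > \epsilon\big\}, \tag{10}$$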
where $K(q\|p) := \sum_{x \in \Omega} q(x)\log\frac{q(x)}{p(x)}$ is called the
Kullback-Leibler divergence (also known as the relative
entropy). When $g$ is injective and $D$ is a distance, for
any weakly consistent estimator in $D$,

$$\lim_{N\to\infty} \frac{1}{N}\log P^{(N)}_\epsilon(\theta) \ge -R_\epsilon(\theta) \tag{11}$$

holds. It is known that in general the lower bound
of Eq. (11) is not attainable by any estimator. If we
consider the limit $\epsilon \to 0$, under some conditions (including
the identifiability condition), a maximum likelihood
estimator achieves the equality, that is,

$$\lim_{\epsilon\to 0}\lim_{N\to\infty} \frac{1}{\epsilon^u N}\log P^{(N)}_\epsilon(\theta) = -r(\theta), \tag{12}$$

where $u$ is a real number suitable for $D$ and $r(\theta) :=
\lim_{\epsilon\to 0} R_\epsilon(\theta)/\epsilon^u$. The explicit forms of the rate are known
for two specific cases. The first is the case where $\Gamma = \mathbb{R}$
and $D$ is the absolute value. In this case, the order $u$ is
2 and the explicit form of the lower bound is known to
be

$$r(\theta) = \frac{1}{2\,\nabla_\theta g(\theta) \cdot F_\theta^{-1}\nabla_\theta g(\theta)}. \tag{13}$$

The second is the case where $\Gamma = \mathbb{R}^k$, $D$ is the Euclidean
distance on $\mathbb{R}^k$, and the order $u$ is again 2; the explicit
form is

$$r(\theta) = \frac{1}{2}\inf_{a \in \mathbb{R}^k ; \|a\| = 1} a \cdot F_\theta a. \tag{14}$$

For more general $\Gamma$ or $D$, however, the explicit form of
the lower bound is not known. Quantum state tomography
corresponds to the case where $\Gamma = \mathbb{R}^{d^2-1}$, and
the standard loss function is the square of the fidelity
distance $D_F(\hat\rho, \hat\rho') := 1 - \mathrm{Tr}\big[\sqrt{\sqrt{\hat\rho}\,\hat\rho'\sqrt{\hat\rho}}\big]^2$ or the square
of the trace distance $D_T(\hat\rho, \hat\rho') := \mathrm{Tr}[|\hat\rho - \hat\rho'|]^2$. In this
paper, we extend the above results to multiparameter
spaces and more general loss functions such as these that
are directly applicable to quantum tomography, and give
the explicit form of the lower bound. We apply our result
to one qubit state tomography and show that it makes
it possible to evaluate the performance of an experimental
apparatus in greater detail. We also give quantum
tomography conditions equivalent to the identifiability
condition in classical estimation theory.

IV. MAIN RESULT AND ANALYSIS 

A. Main theorem 

For simplicity we consider quantum state tomography.
Suppose that we use a loss function $D$ on $\mathcal{S}(\mathcal{H})$. Let us
define a loss function $\Delta$ on $S$ as $\Delta(s, s') := D(\hat\rho(s), \hat\rho(s'))$
$\forall s, s' \in S$. Assume that $\Delta$ is sufficiently smooth. Let
$S^o$ denote the interior of $S$. We define a same point
Hesse matrix $H_s$ for a two variable function $f$ on $S \times S$
as $H_s := \nabla_{s'}\nabla_{s'} f(s', s)|_{s'=s} = [\partial_{s'_\alpha}\partial_{s'_\beta} f(s', s)|_{s'=s}]$. In the
following theorem, we assume the second order approximatability
on the loss function. We choose the order of
the error probability threshold $\epsilon$ to be two, in agreement
with the Euclidean distance.

Theorem 1 Suppose that $\Delta$ is a pseudo-distance on $S$
with a non-zero same point Hesse matrix $H_s$. If $s \in S^o$,
for an arbitrary consistent estimator $s^{est}$, the following
inequality holds:

$$\lim_{\epsilon\to 0}\lim_{N\to\infty} \frac{1}{\epsilon^2 N}\log P^{(N)}_s\big(\Delta(s^{est}_N, s) > \epsilon^2\big)
\ge -1/\sigma_1\big(\sqrt{H_s}\,F_s^{-1}\sqrt{H_s}\big), \tag{15}$$

where $\sigma_1(A)$ is the maximal eigenvalue of a Hermitian
matrix $A$. Furthermore, when the tester is informationally
complete, a maximum likelihood estimator $s^{ml}$ is consistent
and achieves the equality in Eq. (15), i.e.,

$$\lim_{\epsilon\to 0}\lim_{N\to\infty} \frac{1}{\epsilon^2 N}\log P^{(N)}_s\big(\Delta(s^{ml}_N, s) > \epsilon^2\big)
= -1/\sigma_1\big(\sqrt{H_s}\,F_s^{-1}\sqrt{H_s}\big) \tag{16}$$

holds.

The detailed proof of Theorem 1 appears in the Appendix;
here we give an outline. The proof is divided
into six parts. For parts one through five, we do not
assume that the probability distributions are quantum
mechanical; we only assume that they are sufficiently
differentiable and that the parameter space is compact.
Only in the sixth part does quantum mechanics arise. In
Lemma 1, by using the same logic as the known proof of Eq. (11),
we show that Eq. (11) holds for any estimator consistent
not only in distances but also in pseudo-distances.
Lemma 2 is introduced in order to calculate the infimum
in Eq. (10) directly. We use this in Lemma 3, where we
obtain the explicit form of the bound on the rate, and obtain
Eq. (15). Next we introduce Sanov's theorem, a large
deviation theorem that, roughly speaking, gives the rate
of the probability of observing a relative frequency that
differs from the true probability distribution. Lemma 4
uses the compactness of the parameter space and Sanov's
theorem to prove that the error probability of a maximum
likelihood estimator decreases exponentially if the
identifiability condition is satisfied. Then, the maximum
likelihood estimator is consistent and satisfies Eq. (15). In
Lemma 5, we calculate the rate of decrease of the maximum
likelihood estimator directly by using Sanov's theorem
and Lemma 3, and show that the rate coincides with
the lower bound in Eq. (15). Hence, we obtain Eq. (16),
subject to the identifiability condition. Finally, we prove
that in quantum state tomography the identifiability condition
is equivalent to the informational completeness of
the tester, which we present as Lemma 6. Together these
lemmas prove Theorem 1.

Note that in the proof we assume the compactness of
the parameter space (in Lemmas 1 to 5) and the linear
parametrizability of probability distributions (in Lemma
6). These assumptions hold for any quantum operator.
Also, the concept of identifiability applies to the tomographic
completeness of states equally well as it does to
the informational completeness of measurements, which
can be shown using the same logic as that of Lemma 6.
Thus Theorem 1 holds for all types of quantum tomography.
The dimension of the parameter space $k$ depends
upon the type of quantum tomography: $k = d^2 - 1$ and
$d^4 - d^2$ for state and process tomography, respectively.
For POVM and instrument tomography, $k = (M - 1)d^2$
and $Md^4 - d^2$ respectively, where $M$ denotes the number
of measurement outcomes.



B. Meaning of the lower bound 

Theorem 1 indicates that in quantum tomography, if
we have a sufficiently large data set, the error probability
of any consistent estimator with a small threshold can decrease
at most exponentially, and the rate is bounded by
an estimator-independent function $1/\sigma_1(\sqrt{H_s}F_s^{-1}\sqrt{H_s})$.
Also, the bound is achievable by a maximum likelihood
estimator. Therefore, from the error probability viewpoint,
if we can perform a large number of measurement
trials, a maximum likelihood reconstruction scheme is optimal.
We can evaluate the performance of a given tester
by the size of the maximal eigenvalue of the matrix

$$G_s := \sqrt{H_s}\,F_s^{-1}\sqrt{H_s}. \tag{17}$$

Testers with smaller maximal eigenvalues are better.
The inverse Fisher matrix $F_s^{-1}$ alone characterizes the
parameter-identifiability of the tester with respect to the
Euclidean distance because the Hesse matrix of the square
of the Euclidean distance $\Delta_E(s, s') := \|s - s'\|^2$ is $2I$,
and we obtain

$$\frac{1}{\sigma_1(G_s)} = \frac{1}{2\sigma_1(F_s^{-1})} \tag{18}$$
$$= \frac{1}{2}\sigma_k(F_s) \tag{19}$$
$$= \frac{1}{2}\inf_{a \in \mathbb{R}^k ; \|a\| = 1} a \cdot F_s a, \tag{20}$$

where $\sigma_k(A)$ is the minimal eigenvalue of a Hermitian
matrix $A$. This result coincides with the known result of
Eq. (14). The loss function $\Delta$ characterizes the purpose
of the estimation (what we want to know), and the same
point Hesse matrix $H_s$ modifies the inverse Fisher matrix
from the Euclidean distance to the loss function $\Delta$ on
$S$. Therefore the matrix $G_s$ characterizes the parameter-identifiability
of the tester with a modification according
to our estimation purpose.



C. Relation to risk functions

If we assume sufficient smoothness of a loss function $\Delta$
on $S$ and informational completeness on the tester, a generalized
Cramer-Rao inequality can be derived, i.e., for
any unbiased estimator, the following inequality holds:

$$\bar\Delta^{(N)}_s \ge \frac{1}{2N}\mathrm{tr}[H_s F_s^{-1}] + o\!\left(\frac{1}{N}\right), \tag{21}$$

where tr denotes the trace operation with respect to the
parameter space. Eq. (21) indicates that for sufficiently
large $N$, the risk function can decrease at most
inverse-proportionally to $N$, and the rate is characterized
by $\mathrm{tr}[H_s F_s^{-1}]$. We can rewrite this as

$$\mathrm{tr}[H_s F_s^{-1}] = \mathrm{tr}\big[\sqrt{H_s}\,F_s^{-1}\sqrt{H_s}\big] \tag{22}$$
$$= \sum_{\alpha=1}^{k}\sigma_\alpha(G_s), \tag{23}$$
where $\sigma_\alpha(A)$ is the $\alpha$-th eigenvalue of a symmetric $k \times k$
matrix $A$ arranged in decreasing order. Therefore, the
rates of decrease of error probability and risk function are
characterized by, respectively, the maximal eigenvalue
and the sum of all the eigenvalues of a common
matrix $G_s$. The rates' properties depend upon the choice
of the loss function. For example, when we choose the
Kullback-Leibler divergence, i.e., $\Delta(s, s') = K(p_s\|p_{s'})$,
we obtain $H_s = F_s$ and therefore $\sigma_1(G_s) = 1$ and
$\sum_{\alpha=1}^{k}\sigma_\alpha(G_s) = k$. In this case the rates of decrease do
not depend upon the true parameter or the tester, but in
general the rates depend upon both.

The Cramer-Rao inequality holds only for unbiased
estimators, and the bound can be broken by biased estimators.
On the other hand, the error probability inequality
holds for any consistent estimator. A maximum
likelihood estimator is consistent under some conditions
(including the identifiability condition), and is not unbiased
in general but achieves the lower bound of Eq. (21)
asymptotically. When we use a maximum likelihood
reconstruction scheme in quantum tomography, the performance
of the tester is evaluated by $\sum_{\alpha=1}^{k}\sigma_\alpha(G_s)$ from
the risk function viewpoint. When we have two testers
with the same value of $\sum_{\alpha=1}^{k}\sigma_\alpha(G_s)$ at an $s \in S$, their
performances are equivalent in the risk function sense,
but if the maximal eigenvalues $\sigma_1(G_s)$ are different, their
error probability performances are different. Thus we can
evaluate the performance of testers more discerningly by
considering error probabilities than we can by considering
only risk functions, using the same set of eigenvalues,
namely that of the matrix $G_s$.
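The comparison described above can be made concrete with a short sketch. The two eigenvalue lists are hypothetical spectra of $G_s$ for two imagined testers, chosen only so that their sums agree; they do not correspond to any apparatus in this paper.

```python
def risk_coefficient(eigenvalues):
    """Sum of the eigenvalues of G_s, i.e. tr[H_s F_s^{-1}] in Eqs. (22)-(23)."""
    return sum(eigenvalues)

def error_exponent(eigenvalues):
    """Rate 1 / sigma_1(G_s) that bounds the error probability in Theorem 1."""
    return 1.0 / max(eigenvalues)

tester_a = [1.0, 1.0, 1.0]  # hypothetical spectrum of G_s
tester_b = [2.0, 0.5, 0.5]  # same sum, larger maximal eigenvalue

for name, spectrum in (("A", tester_a), ("B", tester_b)):
    print(name, risk_coefficient(spectrum), error_exponent(spectrum))
```

Both testers have the same risk coefficient, but tester A has the larger error exponent, so its error probability can decay faster. This is the sense in which the error probability gives a finer comparison than the risk function alone.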



D. Example 

Here we analyze a simple example of a tester in 1-qubit
state tomography; a 6-state POVM

$$\Pi = \big\{\tfrac{1}{3}|\uparrow_x\rangle\langle\uparrow_x|,\ \tfrac{1}{3}|\downarrow_x\rangle\langle\downarrow_x|,\ \tfrac{1}{3}|\uparrow_y\rangle\langle\uparrow_y|,\ \tfrac{1}{3}|\downarrow_y\rangle\langle\downarrow_y|,\ \tfrac{1}{3}|\uparrow_z\rangle\langle\uparrow_z|,\ \tfrac{1}{3}|\downarrow_z\rangle\langle\downarrow_z|\big\}. \tag{24}$$

This is constructed by mixing the x-, y-, and z-projective
measurements randomly, as in Fig. 1. This example will
serve to illustrate how the performances of risk function
and error probability approaches can differ; see the discussion
in subsection V A.

FIG. 1. An experimental realization of a 6-state POVM in a
photon polarization experiment, consisting of photodetectors
(D), beam splitters (BS), polarizing beam splitters (PBS),
and rotators (R). The rotator Ro defines the direction of the
z-projective measurement, and the angles of rotators in X and
Y are suitably chosen.

We choose a Bloch parametrization of the unknown
state $\hat\rho(s) = \frac{1}{2}(I + s \cdot \sigma)$. Then the inverse of the Fisher
matrix is found to be

$$F_s^{-1} = 3\begin{pmatrix} 1 - s_1^2 & 0 & 0 \\ 0 & 1 - s_2^2 & 0 \\ 0 & 0 & 1 - s_3^2 \end{pmatrix}. \tag{25}$$

As the first example, we choose the square of the Hilbert-Schmidt
distance $\Delta_{HS}(s, s')^2 := \frac{1}{2}\mathrm{Tr}[(\hat\rho(s) - \hat\rho(s'))^2]$ and
the square of the trace distance $\Delta_T(s, s')^2 := \frac{1}{4}\mathrm{Tr}[|\hat\rho(s) -
\hat\rho(s')|]^2$ as the loss functions. Then we obtain that
$\Delta_{HS}(s, s')^2 = \Delta_T(s, s')^2 = \frac{1}{4}\|s - s'\|^2$. The Hesse matrix
$H^{HS}_s$ ($= H^T_s$) is $\frac{1}{2}I$ and the modified information matrix
is $G^{HS}_s = G^T_s = \frac{1}{2}F_s^{-1}$. We obtain

$$\mathrm{tr}[G^{HS}_s] = \mathrm{tr}[G^T_s] = \frac{3}{2}(3 - \|s\|^2), \tag{26}$$
$$\sigma_1(G^{HS}_s) = \sigma_1(G^T_s) = \frac{3}{2}\big(1 - \min\{(s_1)^2, (s_2)^2, (s_3)^2\}\big). \tag{27}$$

We can readily see that

$$3 \le \mathrm{tr}[G^{HS}_s] = \mathrm{tr}[G^T_s] \le \frac{9}{2}, \tag{28}$$
$$\frac{3}{2} - \frac{\|s\|^2}{2} \le \sigma_1(G^{HS}_s) = \sigma_1(G^T_s) \le \frac{3}{2}, \tag{29}$$

where the lower bound of the maximal eigenvalue is
achieved at the points satisfying $|s_1| = |s_2| = |s_3| = \|s\|/\sqrt{3}$.
Eq. (26) indicates that the rate of decrease of the risk
function depends only on the radius $r = \|s\|$ of the Bloch
vector and is independent of the angles $\theta$ and $\phi$. On the
other hand, Eq. (27) indicates that the rate of decrease of
the error probability depends on all parameters $r, \theta, \phi$
(Fig. 2 (a-1), (a-2), (b-1), (b-2)).

Next, we choose a squared fidelity distance
$\Delta_F(s, s')^2 := 1 - f(s, s')^2$ as the loss function,
where $f(s, s')$ is the fidelity between $\hat\rho(s)$ and $\hat\rho(s')$. In the
1-qubit case, the square of the fidelity is written as

$$f(s, s')^2 = \frac{1}{2}\Big(1 + s \cdot s' + \sqrt{(1 - \|s\|^2)(1 - \|s'\|^2)}\Big). \tag{30}$$

We can calculate the Hesse matrix of $\Delta_F$ and its square
root from Eq. (30) as

$$H^F_s = \frac{1}{2}\Big(I + \frac{s s^T}{1 - \|s\|^2}\Big), \tag{31}$$
$$\sqrt{H^F_s} = \frac{1}{\sqrt{2}}\Big\{I + \Big(\frac{1}{\sqrt{1 - \|s\|^2}} - 1\Big)\frac{s s^T}{\|s\|^2}\Big\}. \tag{32}$$

From Eq. (25) and Eq. (31), we obtain

$$\mathrm{tr}[G^F_s] = \frac{9}{2} + \frac{3}{1 - \|s\|^2}\big\{(s_1 s_2)^2 + (s_2 s_3)^2 + (s_3 s_1)^2\big\} \tag{33}$$

and

$$\sigma_1(G^F_s) = \sigma_1\big(\sqrt{H^F_s}\,F_s^{-1}\sqrt{H^F_s}\big). \tag{34}$$

Eq. (33) indicates that the rate of decrease of the risk
function for the fidelity distance depends on all parameters
$r, \theta, \phi$, with plots given in Fig. 2 (c-1), (c-2). The
calculation of the largest eigenvalue $\sigma_1(G^F_s)$ is done numerically,
with results plotted in Fig. 2 (d-1), (d-2). The
figures in Fig. 2 indicate that the rates of decrease of risk
function and error probability change dramatically with
the choice of the loss function.

E. Extension to more general quantum estimation 
problem 

A loss function used in quantum state tomography
is usually a distance on $S$ (or $\mathcal{S}(\mathcal{H})$). This is because
the purpose of quantum state tomography is to identify
the true parameter (or true density operator). There
are, however, cases where exact identifiability is not required,
for example, estimations of the average value of
a Hermitian operator, the purity of an unknown state,
or the value of an entanglement measure. These examples
correspond to the case where $g$ is a map from $S$ to
$\mathbb{R}$. More generally, we can consider $g : S \to \mathbb{R}^l$, $l \le
k = d^2 - 1$. Theorem 1 can be generalized to this case
by modifying the identifiability condition (see Appendix)
and changing the meaning of the superscript $-1$ from
FIG. 2. The dependency of the rates of decrease of risk function and error probability at $\|s\| = 0.7$ against $\theta$ and $\phi$: $\mathrm{tr}[G_s]$ for
the Hilbert-Schmidt distance, (a-1) and (a-2), corresponding to Eq. (26), and for the fidelity distance, (c-1) and (c-2), corresponding
to Eq. (33); $\sigma_1(G_s)$ for the Hilbert-Schmidt distance, (b-1) and (b-2), corresponding to Eq. (27), and for the fidelity distance,
(d-1) and (d-2), corresponding to Eq. (34). These figures show that the bounds for risk function and error probability depend
on the choice of the loss function.



the inverse matrix to the Moore-Penrose generalized inverse.
Specifically, when $l = 1$ and the loss function
$\Delta$ is the squared absolute value, i.e., $g : S \to \mathbb{R}$ and
$\Delta(s, s') = |g(s) - g(s')|^2$, we can obtain

$$H_s = 2(\nabla_s g)(\nabla_s g)^T, \tag{35}$$
$$\frac{1}{\sigma_1(G_s)} = \frac{1}{2\,\nabla_s g \cdot F_s^{-1}\nabla_s g}. \tag{36}$$

This result exactly coincides with the known result of
Eq. (13).

When the parameter space is 1-dimensional, the rates of decrease of the two evaluation methods are
characterized by the same function, but when the parameter space is more than 2-dimensional, the
rates can be characterized differently. The simplest tomographic object, a 1-qubit state, has a
3-dimensional parameter space. Therefore, even in the simplest type of quantum tomography, if two
given testers have the same rate of decrease of a risk function, their rates of decrease of error
probability can be different, i.e., the testers can have different quantum tomographic performance
(see subsection V A).
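
A small worked illustration of how the two characterizations can separate. The matrices $G$ and $G'$ below are hypothetical and are not taken from the examples in this paper; they only show that equal traces do not force equal largest eigenvalues once the matrix is larger than $1 \times 1$.

```latex
\begin{align*}
G  &= \mathrm{diag}(1,\, 1,\, 1), &
\mathrm{tr}[G]  &= 3, & \sigma_1(G)  &= 1, \\
G' &= \mathrm{diag}\!\left(2,\, \tfrac{1}{2},\, \tfrac{1}{2}\right), &
\mathrm{tr}[G'] &= 3, & \sigma_1(G') &= 2 .
\end{align*}
```

For a $1 \times 1$ matrix the trace and the largest eigenvalue are the same number, which is why the distinction disappears for a 1-dimensional parameter space.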



V. DISCUSSION 

A. Evaluating tester performance 

Our result shows that when the true parameter is $s$, the rate of decrease of the error probability
is characterized by $\sigma_1(G_s)$. In real experiments, of course, we do not know the true
parameter, which is the reason we perform tomography in the first place. We explain three
approaches to evaluating tester performance below.

The first is to use the parameter that we expect to be the true parameter. In many experiments,
quantum state tomography is performed not to estimate a state but to demonstrate the experimental
realization of a specific quantum state, for example, a maximally entangled state. By using the
parameter corresponding to the quantum state we want to realize, we can evaluate the tester's
performance in achieving that state. The disadvantage of this method is that the evaluation result
can differ from the true performance in the experiment, because the true parameter can differ from
the parameter that we expect.

The second is to consider the average performance. Let $\mu$ denote a measure on the parameter
space $S$. We define the average performance of the error probability with respect to the measure
$\mu$ as

\[
\int_S d\mu(s)\, \sigma_1(G_s). \tag{37}
\]

In this approach, a tester with a smaller average value of $\sigma_1(G_s)$ is better. The average
performance can be calculated without knowing the true parameter, but it is not guaranteed that
the average value is equivalent to the true performance in the experiment. Since this evaluation
result depends upon the choice of the measure $\mu$, we need to ascertain the validity of the
choice.

The third is to consider the worst case performance. We define the worst case performance of a
tester as

\[
\max_{s \in S} \sigma_1(G_s). \tag{38}
\]

This can be calculated without the true parameter, and it is guaranteed that the true performance
is better than or equal to this value. The disadvantage of this method is that we might evaluate
the tester's performance as much lower than the true performance in the experiment.
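
The three approaches can be compared numerically with a short sketch. Everything specific here is an assumption made for illustration rather than the paper's setup: the POVM is the standard six-state POVM on a qubit (outcome probabilities $(1 \pm s_i)/6$, which give $F_s^{-1} = \mathrm{diag}(3(1 - s_i^2))$), $H_s$ is taken to be the identity, the measure $\mu$ is uniform on the Bloch ball, and the names `sigma1` and `sample_ball` are hypothetical helpers.

```python
import random

def sigma1(s):
    """sigma_1(G_s) for the standard six-state POVM with H_s set to the identity
    (both are assumptions of this sketch): F_s^{-1} = diag(3 (1 - s_i^2))."""
    return max(3.0 * (1.0 - si * si) for si in s)

def sample_ball(rng, radius=1.0):
    """Uniform sample from the Bloch ball by rejection from the enclosing cube."""
    while True:
        s = tuple(rng.uniform(-radius, radius) for _ in range(3))
        if sum(si * si for si in s) <= radius * radius:
            return s

rng = random.Random(0)
values = [sigma1(sample_ball(rng)) for _ in range(20000)]

expected_state = (0.0, 0.0, 0.7)       # approach 1: the state we hope to prepare
first = sigma1(expected_state)
average = sum(values) / len(values)    # approach 2: Monte Carlo estimate of Eq. (37)
worst = max(values)                    # approach 3: sampled lower bound on Eq. (38)

print("expected state:", first)
print("average:", average)
print("worst case (sampled):", worst)
```

The sampled maximum can only underestimate the true maximum in Eq. (38); for this POVM the true value is 3, attained whenever one Bloch component vanishes.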

As an example we compare the performance of testers according to the first approach. Let us
consider the 6-state POVM explained in Sec. IV and Fig. 1. Suppose that the density operator which
we try to realize is characterized by $(r = 0.7, \theta = 0, \phi = 0)$. The rates of decrease of
the risk function and the error probability for $\Delta^{\mathrm{HS}}$, $\Delta^{\mathrm{T}}$,
$\Delta^{\mathrm{F}}$ are characterized by $\mathrm{tr}[G_s]$ and $\sigma_1(G_s)$, given in
Eqs. (26), (27), (33), and Fig. 2. Suppose we tune the angles $\theta_0$ and $\phi_0$ of the
rotator $R_0$ in Fig. 1. Then the true Bloch vector is rotated to
$(r = 0.7, \theta = \theta_0, \phi = \phi_0)$. Which angles $\theta_0$ and $\phi_0$ should we
choose for the state tomography? The true density operator may not be exactly what we want, but it
is expected to be close to it because we make an effort to realize the state in the experiment.
So, it is natural to tune the angles $\theta_0$ and $\phi_0$ so that the statistical error becomes
as small as possible at the rotated objective density operator
$(r = 0.7, \theta = \theta_0, \phi = \phi_0)$.

If we use the square of the Hilbert-Schmidt distance as the loss function, the rate of decrease of
the risk function is independent of the angle of the rotator (Fig. 2 (a-1), (a-2)). Experimental
setups with any angle of $R_0$ have equivalent performance from the risk function viewpoint. On
the other hand, the rate of decrease of the error probability depends on those angles (Fig. 2
(b-1), (b-2)). We should tune the angles to the point where
$(r = 0.7, \theta = \theta_0, \phi = \phi_0)$ is at one of the minima in Fig. 2 (b-2). Our error
probability approach therefore allows us to evaluate the statistical performance of these testers
(experiments with varying angles of $R_0$) while a risk function approach would not. If we use the
fidelity distance, the minima of the risk function and the error probability are the same
(although the curves are not, as the figures show), and we should choose the angles such that
$(r = 0.7, \theta = \theta_0, \phi = \phi_0)$ is at one of the minima in Fig. 2 (c-2) and (d-2).
This illustrates that the difference between the approaches hinges upon the choice of loss
function we use in our analysis.
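
The angle dependence described above can be reproduced with a rough numerical sketch. It rests on stated assumptions rather than on the paper's exact equations: the tester is modeled as the standard six-state POVM, for which $F_s^{-1} = \mathrm{diag}(3(1 - s_i^2))$ in Bloch coordinates, and the Hilbert-Schmidt loss is taken to have $H_s$ proportional to the identity, so only overall factors are lost. The function name is hypothetical.

```python
import math

def inverse_fisher_eigenvalues(r, theta, phi):
    """Eigenvalues of F_s^{-1} for the standard six-state POVM (an assumption of
    this sketch), where F_s = diag(1 / (3 (1 - s_i^2))) in Bloch coordinates."""
    s = (r * math.sin(theta) * math.cos(phi),
         r * math.sin(theta) * math.sin(phi),
         r * math.cos(theta))
    return [3.0 * (1.0 - si * si) for si in s]

r = 0.7
results = []
for theta_deg in range(0, 181):
    for phi_deg in range(0, 360):
        eigs = inverse_fisher_eigenvalues(
            r, math.radians(theta_deg), math.radians(phi_deg))
        # Store (largest eigenvalue, trace, angles) for every grid point.
        results.append((max(eigs), sum(eigs), theta_deg, phi_deg))

traces = [t for _, t, _, _ in results]
best_sigma1, _, best_theta, best_phi = min(results)

print("trace range:", min(traces), max(traces))
print("smallest sigma_1:", best_sigma1, "at theta, phi =", best_theta, best_phi)
```

Under these assumptions the trace is the same at every angle, while the largest eigenvalue is smallest when the Bloch vector points along a diagonal between the three measurement axes.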



B. Extension to infinite sample space 

Theorem 1 holds for a finite sample space. For a specific case ($g : S \to \mathbb{R}$ and
$\Delta$ is the squared absolute value), it is known that Eq. (11) also holds for an infinite
sample space under some regularity conditions [10], [11]. We can prove that Theorem 1 holds for an
infinite sample space under some conditions by combining the proof in [10], [11] with Sanov's
theorem and using the linear parametrizability of probability distributions in quantum mechanics.
Therefore, Theorem 1 holds not only for finite, but also for infinite sample spaces. However, any
real experiment will have finite detector resolution, and so finite sample spaces suffice.



C. Effect of parameter space boundary 

In Theorem 1, the true parameter is limited to the interior $S^\circ$. Hence it cannot be applied
to parameters on the boundary $\partial S := S \setminus S^\circ$, which corresponds to the set of
all non-full-rank density operators, including all pure states. This limitation can be overlooked
by invoking decoherence: in real experiments the system of interest is uncontrollably affected by
the environment, leading to full rank states parametrized in the interior. The reason behind the
limitation is technical, stemming from the fact that in our proof we assume the invariance of the
support of the probability distribution, differentiability, and openness at each point of the
parameter space. Such regularity conditions are assumed in standard classical statistical
estimation theory. Statistical models that do not satisfy the regularity conditions are called
non-regular, and it is known that they can behave very differently from regular statistical models
[40]. The analysis of risk functions and error probabilities at $\partial S$ is an open problem.



D. Relation to quantum Fisher matrix 

There is an approach to statistical estimation in quan- 
tum systems using a quantity called the quantum Fisher 
matrix. In this subsection, we briefly explain the rela- 
tionship between quantum and classical Fisher matrix 
approaches. 

The quantum Fisher matrix approach is an attempt to derive the maximal value of the information
extractable from a quantum system. The quantum Fisher matrix $F_s^{Q}$ is defined as the matrix
satisfying

\[
F_s^{Q} \ge F_s(\Pi), \tag{39}
\]

for all POVMs $\Pi$ and $s \in S$, where $F_s(\Pi)$ is the usual Fisher matrix, as well as a
monotonicity condition under quantum operations [41-43]. We write $(\Pi)$ in order to emphasize
the dependency on the POVM. By combining Eq. (21) and Eq. (39), we can obtain the quantum
Cramér-Rao inequality,



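\[
\mathbb{E}_s\!\left[\Delta(s_N^{\mathrm{est}}, s)\right]
\ge \frac{\mathrm{tr}\!\left[H_s\,(F_s^{Q})^{-1}\right]}{2N}. \tag{40}
\]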



By definition, the quantum Fisher matrix depends only on the true density operator and is
independent of POVMs. So from the risk function viewpoint, the quantum Fisher matrix can be
interpreted as the principal bound of the rate of decrease for a fixed true density operator. By
combining our result, Eq. (15), and Eq. (39), we can obtain

\[
\varliminf_{\epsilon \to 0}\, \varliminf_{N \to \infty} \frac{1}{\epsilon^{2} N}
\log P_s^{(N)}\!\left(\Delta(s_N^{\mathrm{est}}, s) > \epsilon^{2}\right)
\ge -\frac{1}{\sigma_1\!\left(\sqrt{H_s}\,(F_s^{Q})^{-1}\sqrt{H_s}\right)}. \tag{41}
\]



In general, however, there are no POVMs achieving the equality in Eq. (39), except for specific
cases which include a one-dimensional parameter space [43]. So the bound is not tight in general
multi-parameter estimation, like quantum tomography. We use the classical Fisher matrix here
because we are interested in evaluating the performance of a fixed experimental apparatus
(tester), and we therefore require POVM dependence. One could evaluate the performance of a POVM
by comparing the value of $\sigma_1(\sqrt{H_s}\,F_s(\Pi)^{-1}\sqrt{H_s})$ with
$\sigma_1(\sqrt{H_s}\,(F_s^{Q})^{-1}\sqrt{H_s})$, but the compared bound is not achievable in
general. The derivation of the optimal POVM is an open problem.
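
The step from Eq. (39) to Eq. (41) can be spelled out as follows. This is a sketch under two assumptions that are not restated in this subsection: $F_s(\Pi)$ is full rank (the POVM is informationally complete), and Eq. (15) is the bound of the same form as Eq. (41) with $F_s(\Pi)$ in place of $F_s^{Q}$.

```latex
\begin{align*}
F_s^{Q} \ge F_s(\Pi) > 0
 &\;\Longrightarrow\; (F_s^{Q})^{-1} \le F_s(\Pi)^{-1} \\
 &\;\Longrightarrow\; \sqrt{H_s}\,(F_s^{Q})^{-1}\sqrt{H_s}
      \le \sqrt{H_s}\,F_s(\Pi)^{-1}\sqrt{H_s} \\
 &\;\Longrightarrow\; \sigma_1\!\big(\sqrt{H_s}\,(F_s^{Q})^{-1}\sqrt{H_s}\big)
      \le \sigma_1\!\big(\sqrt{H_s}\,F_s(\Pi)^{-1}\sqrt{H_s}\big) \\
 &\;\Longrightarrow\; -\frac{1}{\sigma_1\!\big(\sqrt{H_s}\,F_s(\Pi)^{-1}\sqrt{H_s}\big)}
      \ge -\frac{1}{\sigma_1\!\big(\sqrt{H_s}\,(F_s^{Q})^{-1}\sqrt{H_s}\big)} .
\end{align*}
```

The last line shows that replacing the classical Fisher matrix by the quantum one can only loosen the lower bound on the error probability.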



VI. SUMMARY 

In this paper, we proved a large deviation inequality for consistent estimators in quantum
tomography by using classical statistical estimation techniques. The inequality shows that, under
some conditions, the error probability of any consistent estimator can decrease at most
exponentially with respect to the total number of measurement trials, and that there is a bound on
the rate of decrease which is achievable by a maximum likelihood estimator under the informational
completeness of the tester. We also derived the explicit form of the bound and proved that known
quantum tomography conditions are equivalent to the identifiability condition in classical
estimation theory.

Our results show that a risk function and an error probability measured by the same loss function
are characterized by a common matrix, the inverse Fisher matrix modified by the loss function. The
rate of decrease (with respect to the number of trials) of the risk function is characterized by
the sum of the eigenvalues of this matrix, and that of the error probability by the maximal
eigenvalue. The Cramér-Rao inequality, which is a known risk function inequality, holds only for
unbiased estimators, and the bound can be broken by biased estimators. On the other hand, the
error probability inequality holds for any consistent estimator, i.e., any estimator which gives
us the true object in the limit of infinite trials. Therefore, the lower bound of the error
probability characterizes the performance of the given apparatus, independently of the choice of
estimator. The explicit form of the bound makes it possible to quantify the performance of the
apparatus for the estimation purpose in the error probability sense. We showed, by using a 6-state
POVM in single qubit state tomography as an example, that by combining our error probability
approach with a risk function approach, we can evaluate the performance more discerningly than we
can by considering only risk functions.



Appendix: Proof of main theorem 

We give the detailed proof of Theorem 1, using classical 
statistical estimation theory. We divide the proof into six 
parts in order to clarify the role of each condition, as well 
as to isolate the role of quantum mechanics in the main 
result. 



1. Six lemmas 

We first consider the setup described in Sec. III, that is, we do not assume the statistical model
given by Eq. (1). Suppose that the parameter space $\Theta$ is a compact subset of $\mathbb{R}^k$.
Let $\partial\Theta$ denote the boundary of $\Theta$, that is,
$\partial\Theta := \Theta \setminus \Theta^\circ$, and assume that $\Theta^\circ$ is open and
nonempty. We also assume that $p_\theta(x)$ is a thrice differentiable function with respect to
$\theta \in \Theta$ for any $x \in \Omega$. Note that these assumptions are satisfied in quantum
mechanics for finite dimensional systems.

First, we prove that Eq. pTj) holds for any estima- 
tor consistent not only in distances, but also in pseudo- 
distances. 

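Lemma 1 Suppose that $\Delta$ is a pseudo-distance on $\Theta$. If $\theta \in \Theta^\circ$, for
an arbitrary estimator $\theta^{\mathrm{est}}$ consistent in $\Delta$, the following inequality
holds:

\[
\varliminf_{N \to \infty} \frac{1}{N}
\log P_\theta^{(N)}\!\left(\Delta(\theta_N^{\mathrm{est}}, \theta) > \epsilon^{2}\right)
\ge -\inf_{\theta' \in \Theta}
\left\{ K(p_{\theta'} \| p_\theta);\ \Delta(\theta', \theta) > \epsilon^{2} \right\}. \tag{A.1}
\]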

Proof: This is a straightforward generalizations of the 
proof in Q , so we omit it here. □ 
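From Lemma 1, we obtain

\[
\varliminf_{\epsilon \to 0}\, \varliminf_{N \to \infty} \frac{1}{\epsilon^{2} N}
\log P_\theta^{(N)}\!\left(\Delta(\theta_N^{\mathrm{est}}, \theta) > \epsilon^{2}\right)
\ge -\varlimsup_{\epsilon \to 0} \frac{1}{\epsilon^{2}} \inf_{\theta' \in \Theta}
\left\{ K(p_{\theta'} \| p_\theta);\ \Delta(\theta', \theta) > \epsilon^{2} \right\}. \tag{A.2}
\]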

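Second, we introduce a lemma for calculating the R.H.S. of Eq. (A.2).

Lemma 2 Let $A$ and $B$ be $k \times k$ real, positive-semidefinite matrices. If
$\mathrm{supp}\,A \supseteq \mathrm{supp}\,B$ holds, then

\[
\inf_{a \notin \ker B} \left\{ \frac{a \cdot A a}{a \cdot B a} \right\}
= \frac{1}{\sigma_1\!\left(\sqrt{B}\,A^{-1}\sqrt{B}\right)} \tag{A.3}
\]

holds, where $A^{-1}$ is the Moore-Penrose generalized inverse of $A$.

Proof: Let us define $b := \sqrt{A}\,a / \|\sqrt{A}\,a\|$. Then,

\begin{align}
\inf_{a \notin \ker B} \left\{ \frac{a \cdot A a}{a \cdot B a} \right\}
&= \inf_{b \notin \ker \sqrt{B}\sqrt{A}^{-1};\ \|b\| = 1}
   \frac{1}{b \cdot \sqrt{A}^{-1} B \sqrt{A}^{-1} b} \tag{A.4} \\
&= \frac{1}{\sigma_1\!\left(\sqrt{A}^{-1} B \sqrt{A}^{-1}\right)}. \tag{A.5}
\end{align}

Let us consider the singular value decomposition of $\sqrt{A}^{-1}\sqrt{B}$, i.e.,
$\sqrt{A}^{-1}\sqrt{B} = U_1 \Lambda U_2$, where $U_1$ and $U_2$ are $k \times k$ unitary matrices
and $\Lambda$ is a diagonal matrix. We obtain

\begin{align}
\sqrt{A}^{-1} B \sqrt{A}^{-1}
&= (\sqrt{A}^{-1}\sqrt{B})(\sqrt{A}^{-1}\sqrt{B})^{T} \tag{A.6} \\
&= U_1 \Lambda^{2} U_1^{\dagger}, \tag{A.7} \\
\sqrt{B}\,A^{-1}\sqrt{B}
&= (\sqrt{A}^{-1}\sqrt{B})^{T}(\sqrt{A}^{-1}\sqrt{B}) \tag{A.8} \\
&= U_2^{\dagger} \Lambda^{2} U_2. \tag{A.9}
\end{align}

Therefore $\sigma_1(\sqrt{A}^{-1} B \sqrt{A}^{-1}) = \sigma_1(\sqrt{B}\,A^{-1}\sqrt{B})$. □

Note that when $A$ is full rank, the Moore-Penrose generalized inverse coincides with the (usual)
inverse.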
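
Lemma 2 can be sanity-checked numerically in the smallest nontrivial case. The sketch below is an illustration under stated assumptions, not part of the proof: the $2 \times 2$ matrices `A` and `B` are hypothetical, both are chosen positive definite so that $\ker B$ is trivial, and $\sigma_1(\sqrt{B} A^{-1} \sqrt{B})$ is evaluated as the largest eigenvalue of $A^{-1}B$, which has the same nonzero eigenvalues.

```python
import math

def quad(m, a):
    """Quadratic form a . (m a) for a 2x2 matrix m and a 2-vector a."""
    return (a[0] * (m[0][0] * a[0] + m[0][1] * a[1])
            + a[1] * (m[1][0] * a[0] + m[1][1] * a[1]))

def largest_eig_Ainv_B(A, B):
    """Largest eigenvalue of A^{-1} B for 2x2 symmetric positive-definite A and B.
    It equals sigma_1(sqrt(B) A^{-1} sqrt(B)) because the two matrices share
    their nonzero eigenvalues."""
    det_a = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    inv = [[A[1][1] / det_a, -A[0][1] / det_a],
           [-A[1][0] / det_a, A[0][0] / det_a]]
    M = [[sum(inv[i][k] * B[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return 0.5 * (tr + math.sqrt(max(tr * tr - 4.0 * det, 0.0)))

A = [[2.0, 0.5], [0.5, 1.0]]   # hypothetical matrix playing the role of F
B = [[1.0, 0.3], [0.3, 0.5]]   # hypothetical matrix playing the role of H

def ratio(t):
    """Value of (a . A a) / (a . B a) on the unit vector a = (cos t, sin t)."""
    a = (math.cos(t), math.sin(t))
    return quad(A, a) / quad(B, a)

n = 20000
brute_force = min(ratio(math.pi * i / n) for i in range(n))   # grid infimum
closed_form = 1.0 / largest_eig_Ainv_B(A, B)                   # R.H.S. of (A.3)

print(brute_force, closed_form)
```

The ratio is invariant under rescaling of $a$, so scanning unit vectors over half a turn is enough to cover all directions.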

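Third, we calculate the infimum on the R.H.S. of Eq. (A.2).

Lemma 3 Suppose that $\Delta$ is a sufficiently smooth pseudo-distance with a non-zero same point
Hesse matrix $H_\theta$. Then

\[
\varlimsup_{\epsilon \to 0} \frac{1}{\epsilon^{2}} \inf_{\theta' \in \Theta}
\left\{ K(p_{\theta'} \| p_\theta);\ \Delta(\theta', \theta) > \epsilon^{2} \right\}
= \frac{1}{\sigma_1\!\left(\sqrt{H_\theta}\,F_\theta^{-1}\sqrt{H_\theta}\right)} \tag{A.10}
\]

holds.

Proof: Let us define

\[
B(\theta', \theta) := \frac{2\,\Delta(\theta', \theta)}{\|\theta' - \theta\|^{2}}.
\]

Then

\[
B(\theta', \theta) = \frac{(\theta' - \theta)}{\|\theta' - \theta\|} \cdot
H_\theta \frac{(\theta' - \theta)}{\|\theta' - \theta\|} + O(\|\theta' - \theta\|), \tag{A.11}
\]

and the first term is independent of $\|\theta' - \theta\|$. Then, for sufficiently small
$\epsilon$,

\begin{align}
\frac{1}{\epsilon^{2}} \inf_{\theta' \in \Theta}
\left\{ K(p_{\theta'} \| p_\theta);\ \Delta(\theta', \theta) > \epsilon^{2} \right\}
&= \frac{1}{\epsilon^{2}} \inf_{\theta' \in \Theta}
   \left\{ K(p_{\theta'} \| p_\theta);\
   \|\theta' - \theta\|^{2} > \frac{2\epsilon^{2}}{B(\theta', \theta)} \right\} \tag{A.12} \\
&= \frac{1}{\epsilon^{2}} \inf_{\theta' \in \Theta}
   \left\{ \tfrac{1}{2} (\theta' - \theta) \cdot F_\theta (\theta' - \theta)
   + O(\|\theta' - \theta\|^{3});\
   \|\theta' - \theta\|^{2} > \frac{2\epsilon^{2}}{B(\theta', \theta)} \right\} \tag{A.13} \\
&\xrightarrow{\ \epsilon \to 0\ } \inf_{a \notin \ker H_\theta}
   \left\{ \frac{a \cdot F_\theta a}{a \cdot H_\theta a};\ \|a\| = 1 \right\} \tag{A.14} \\
&= \frac{1}{\sigma_1\!\left(\sqrt{H_\theta}\,F_\theta^{-1}\sqrt{H_\theta}\right)}, \tag{A.15}
\end{align}

where we used Lemma 2 in the last line. Note that Eq. (A.10) holds not only for the limit superior
$\varlimsup_{\epsilon \to 0}$, but also for the limit inferior $\varliminf_{\epsilon \to 0}$. □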
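
The quadratic expansion of the Kullback-Leibler divergence used in the proof, $K(p_{\theta'} \| p_\theta) \approx \tfrac{1}{2}(\theta' - \theta) \cdot F_\theta (\theta' - \theta)$, can be checked numerically. The sketch assumes the standard six-state POVM on a qubit as a concrete model (an illustrative choice, with $F_s = \mathrm{diag}(1/(3(1 - s_i^2)))$); the point `s` and the direction are hypothetical.

```python
import math

def probs(s):
    """Outcome probabilities (1 +- s_i) / 6 of the standard six-state POVM
    (an assumption of this sketch), ordered as (+x, -x, +y, -y, +z, -z)."""
    return [(1.0 + sign * si) / 6.0 for si in s for sign in (+1.0, -1.0)]

def kl(p, q):
    """Kullback-Leibler divergence K(p || q) for strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def fisher_quadratic(s, delta):
    """delta . F_s delta with F_s = diag(1 / (3 (1 - s_i^2))) for this POVM."""
    return sum(d * d / (3.0 * (1.0 - si * si)) for si, d in zip(s, delta))

s = (0.3, -0.2, 0.5)            # hypothetical interior Bloch vector
direction = (0.6, 0.0, 0.8)     # unit vector along which s is displaced

for eps in (1e-1, 1e-2, 1e-3):
    delta = tuple(eps * d for d in direction)
    s_prime = tuple(si + di for si, di in zip(s, delta))
    exact = kl(probs(s_prime), probs(s))
    approx = 0.5 * fisher_quadratic(s, delta)
    print(eps, exact, approx, exact / approx)
```

The ratio of the exact divergence to the quadratic form approaches 1 as the displacement shrinks, with a relative error that scales linearly in the displacement, in line with the $O(\|\theta' - \theta\|^3)$ remainder.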

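From Lemma 1 and Lemma 3, we obtain the following inequality for any estimator consistent in a
sufficiently smooth pseudo-distance with the Hesse matrix $H_\theta$:

\[
\varliminf_{\epsilon \to 0}\, \varliminf_{N \to \infty} \frac{1}{\epsilon^{2} N}
\log P_\theta^{(N)}\!\left(\Delta(\theta_N^{\mathrm{est}}, \theta) > \epsilon^{2}\right)
\ge -\frac{1}{\sigma_1\!\left(\sqrt{H_\theta}\,F_\theta^{-1}\sqrt{H_\theta}\right)}. \tag{A.16}
\]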



Fourth, we prove that if the identifiability condition is satisfied, then a maximum likelihood
estimator is consistent in the pseudo-distance $\Delta$. In preparation, we introduce empirical
measures. Given a finite sequence $x^N = \{x_1, \ldots, x_N\}$ and $Y \in \mathcal{B}$, the
empirical measure $L_N^{x^N}$ induced by the sequence is defined as

\[
L_N^{x^N}(Y) := \frac{1}{N} \sum_{i=1}^{N} \sum_{y \in Y} \delta_{y, x_i}, \tag{A.17}
\]

where $\delta_{y,x}$ is Kronecker's delta. Then the value of the empirical measure on an elemental
set $\{x\} \in \mathcal{B}$ is equivalent to the relative frequency of $x$ for the data $x^N$,
i.e., $f_N(x) = L_N^{x^N}(\{x\})$. We identify $L_N^{x^N}$ and $f_N$ below.
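
As a concrete illustration of Eq. (A.17), the following minimal sketch computes the relative frequencies and the empirical measure of a subset from a measurement record. The outcome labels and the data are hypothetical, and `empirical_measure` and `measure_of_set` are names introduced only for this example.

```python
from collections import Counter

def empirical_measure(sample, outcomes):
    """Relative frequencies f_N(x) = L_N({x}) of each outcome in the sample."""
    counts = Counter(sample)
    n = len(sample)
    return {x: counts[x] / n for x in outcomes}

def measure_of_set(f_n, subset):
    """L_N(Y) for a subset Y of outcomes: the sum of relative frequencies in Y."""
    return sum(f_n[x] for x in subset)

outcomes = range(6)                       # hypothetical labels of a 6-outcome POVM
sample = [0, 3, 3, 5, 1, 0, 3, 2, 0, 4]   # hypothetical measurement record, N = 10
f_n = empirical_measure(sample, outcomes)

print(f_n)
print(measure_of_set(f_n, {0, 3}))
```

Outcomes that never occur get frequency zero, so `f_n` is always a probability distribution on the full sample space.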

Now we introduce Sanov's theorem for empirical measures. Let $P_p$ denote a probability measure on
$\mathcal{B}$ with a probability distribution $p$. When $p \in \mathcal{P}_\Theta$, we have
$P_p = P_\theta$. We use the notation
$P_p^{(N)}(L_N^{x^N} \in A) := P_p^{(N)}(\{x^N \in \Omega^N;\ L_N^{x^N} \in A\})$, where $A$ is a
given set of probability distributions.

Theorem (Sanov) For every set $A$ of probability distributions in $\mathcal{P}(\Omega)$,

\begin{align}
-\inf_{p' \in A^\circ} K(p' \| p)
&\le \varliminf_{N \to \infty} \frac{1}{N} \log P_p^{(N)}(L_N^{x^N} \in A) \tag{A.18} \\
&\le \varlimsup_{N \to \infty} \frac{1}{N} \log P_p^{(N)}(L_N^{x^N} \in A) \tag{A.19} \\
&\le -\inf_{p' \in A} K(p' \| p), \tag{A.20}
\end{align}

where $A^\circ$ is the interior of $A$ considered as a subset of
■p(fi) and K is the Kullback-Leibler divergence fill . ["/H /. 
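
Sanov's theorem can be illustrated in the simplest setting of two outcomes. The sketch below is an assumption-laden toy example, not the tomographic setting: the true success probability `p` and the threshold `a` are hypothetical, and the set $A$ is taken to be all distributions whose success probability is at least `a`, so that the infimum of the divergence over $A$ is attained at `a`.

```python
import math

def log_binom_pmf(n, k, p):
    """Logarithm of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def log_tail(n, p, a):
    """log P(relative frequency of successes >= a), via a stable log-sum-exp."""
    logs = [log_binom_pmf(n, k, p) for k in range(math.ceil(n * a), n + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))

def kl_bernoulli(a, p):
    """K(a || p) for the two-outcome distributions (a, 1 - a) and (p, 1 - p)."""
    return a * math.log(a / p) + (1.0 - a) * math.log((1.0 - a) / (1.0 - p))

p, a = 0.3, 0.5      # hypothetical true probability and deviation threshold
for n in (100, 1000, 10000):
    rate = -log_tail(n, p, a) / n
    print(n, rate, kl_bernoulli(a, p))
```

The empirical rate approaches the divergence from above, with a gap of order $(\log N)/N$ coming from the sub-exponential prefactor.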

We are now in a position to prove the following lemma.

Lemma 4 If the identifiability condition is satisfied, then

\[
\lim_{N \to \infty}
P_\theta^{(N)}\!\left(\Delta(\theta_N^{\mathrm{ml}}, \theta) > \epsilon^{2}\right) = 0 \tag{A.21}
\]

holds for any $\epsilon > 0$. That is, a maximum likelihood estimator is consistent in a
pseudo-distance $\Delta$ on $\Theta$.

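Proof: A maximum likelihood estimate $\theta_N^{\mathrm{ml}}$ can be redefined by using the
Kullback-Leibler divergence and the relative frequency as follows:

\begin{align}
\theta_N^{\mathrm{ml}}
&:= \mathop{\mathrm{argmax}}_{\theta \in \Theta} \prod_{i=1}^{N} p_\theta(x_i) \tag{A.22} \\
&= \mathop{\mathrm{argmin}}_{\theta \in \Theta} K(f_N \| p_\theta). \tag{A.23}
\end{align}

Let us define

\[
\theta_p := \mathop{\mathrm{argmin}}_{\theta \in \Theta} K(p \| p_\theta). \tag{A.24}
\]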



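Then $\theta_N^{\mathrm{ml}} = \theta_{f_N}$. When analyzing a maximum likelihood estimate
$\theta_N^{\mathrm{ml}}$, we need to be careful to check whether $\theta_N^{\mathrm{ml}}$ is
included in $\Theta^\circ$ or $\partial\Theta$. Let us introduce four sets of probability
distributions $A_1$, $A_2$, $A_3$, and $\mathcal{D}_{\theta,\epsilon}$ as

\begin{align}
A_1 &:= \{ p \in \mathcal{P}_\Theta;\ \theta_p \in \Theta^\circ \}, \tag{A.25} \\
A_2 &:= \{ p \in \mathcal{P}_\Theta;\ \theta_p \in \partial\Theta \}, \tag{A.26} \\
A_3 &:= \mathcal{P}(\Omega) \setminus \mathcal{P}_\Theta, \tag{A.27} \\
\mathcal{D}_{\theta,\epsilon} &:= \{ p \in \mathcal{P}(\Omega);\ \Delta(\theta_p, \theta) > \epsilon^{2} \}. \tag{A.28}
\end{align}

If $f_N \in A_1 \cup A_2\ (= \mathcal{P}_\Theta)$, then $p_{\theta_N^{\mathrm{ml}}} = f_N$. If
$f_N \in A_3$, then $p_{\theta_N^{\mathrm{ml}}} \in \mathcal{P}_\Theta$ and
$p_{\theta_N^{\mathrm{ml}}} \neq f_N$. Since $\mathcal{P}(\Omega) = A_1 \cup A_2 \cup A_3$ and
these sets are disjoint, we can rewrite the error probability as

\begin{align}
P_\theta^{(N)}\!\left(\Delta(\theta_N^{\mathrm{ml}}, \theta) > \epsilon^{2}\right)
&= P_\theta^{(N)}\!\left(f_N \in \mathcal{D}_{\theta,\epsilon}\right) \tag{A.29} \\
&= \sum_{j=1}^{3} P_\theta^{(N)}\!\left(f_N \in A_j \cap \mathcal{D}_{\theta,\epsilon}\right). \tag{A.30}
\end{align}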



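Because $\Theta$ is compact and $\Theta^\circ$ is not empty, from Sanov's theorem, we can obtain

\[
\lim_{N \to \infty} \frac{1}{N}
\log P_\theta^{(N)}\!\left(f_N \in A_j \cap \mathcal{D}_{\theta,\epsilon}\right)
= -\inf_{p \in A_j \cap \mathcal{D}_{\theta,\epsilon}} K(p \| p_\theta), \quad j = 1, 2, 3. \tag{A.31}
\]

From the identifiability condition,

\[
\inf_{p \in A_j \cap \mathcal{D}_{\theta,\epsilon}} K(p \| p_\theta) > 0, \quad j = 1, 2, 3. \tag{A.32}
\]

Therefore, for sufficiently large $N$, there exists $\nu$, $0 < \nu < 1$, such that

\[
P_\theta^{(N)}\!\left(\Delta(\theta_N^{\mathrm{ml}}, \theta) > \epsilon^{2}\right) < \nu^{N} \tag{A.33}
\]

holds for any $\epsilon > 0$. So, a maximum likelihood estimator is consistent in $\Delta$ under
the identifiability condition. □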
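
The redefinition of the maximum likelihood estimate as a minimizer of the Kullback-Leibler divergence from the relative frequency, used at the start of the proof above, rests on the identity $\frac{1}{N}\log\prod_i p_\theta(x_i) = -K(f_N \| p_\theta) + \sum_x f_N(x)\log f_N(x)$, whose last term does not depend on $\theta$. The sketch below checks this on a grid. The one-parameter family `model` and the counts are hypothetical and chosen only for illustration; they are not a tomographic model.

```python
import math

def model(theta):
    """Hypothetical one-parameter family on three outcomes (not from the paper)."""
    return (theta ** 2, 2.0 * theta * (1.0 - theta), (1.0 - theta) ** 2)

def log_likelihood(theta, counts):
    """Logarithm of the likelihood of the observed counts under model(theta)."""
    return sum(c * math.log(p) for c, p in zip(counts, model(theta)) if c > 0)

def kl(f, p):
    """Kullback-Leibler divergence K(f || p) for strictly positive p."""
    return sum(fi * math.log(fi / pi) for fi, pi in zip(f, p) if fi > 0.0)

counts = (22, 41, 37)                        # hypothetical data, N = 100
n = sum(counts)
f_n = tuple(c / n for c in counts)           # relative frequencies

grid = [i / 1000.0 for i in range(1, 1000)]  # theta in (0, 1)
theta_ml = max(grid, key=lambda t: log_likelihood(t, counts))
theta_kl = min(grid, key=lambda t: kl(f_n, model(t)))

print(theta_ml, theta_kl)
```

In this example the relative frequency lies outside the model family, so the fitted distribution differs from $f_N$ even though both criteria select the same parameter.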
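Fifth, we prove that if the identifiability condition is satisfied, a maximum likelihood estimator
achieves the equality in Eq. (A.16).

Lemma 5 Suppose that $\Delta$ is a sufficiently smooth pseudo-distance on $\Theta$ with a non-zero
same point Hesse matrix $H_\theta$. If the identifiability condition is satisfied, then

\[
\lim_{\epsilon \to 0}\, \lim_{N \to \infty} \frac{1}{\epsilon^{2} N}
\log P_\theta^{(N)}\!\left(\Delta(\theta_N^{\mathrm{ml}}, \theta) > \epsilon^{2}\right)
= -\frac{1}{\sigma_1\!\left(\sqrt{H_\theta}\,F_\theta^{-1}\sqrt{H_\theta}\right)} \tag{A.34}
\]

holds.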



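Proof: From the continuity of $K$ and the openness of $\Theta^\circ$, for arbitrary
$\theta \in \Theta^\circ$, there exists $\epsilon_0 > 0$ such that

\[
\inf_{p \in A_1 \cap \mathcal{D}_{\theta,\epsilon}} K(p \| p_\theta)
< \inf_{p \in A_j \cap \mathcal{D}_{\theta,\epsilon}} K(p \| p_\theta) \tag{A.35}
\]

holds for $j = 2, 3$ and for any $\epsilon$ satisfying $0 < \epsilon < \epsilon_0$. Hence, for
sufficiently large $N$ and sufficiently small $\epsilon$,

\[
P_\theta^{(N)}\!\left(f_N \in A_1 \cap \mathcal{D}_{\theta,\epsilon}\right)
> P_\theta^{(N)}\!\left(f_N \in A_j \cap \mathcal{D}_{\theta,\epsilon}\right) \tag{A.36}
\]

holds for $j = 2, 3$, and we have

\begin{align}
\varlimsup_{\epsilon \to 0}\, \varlimsup_{N \to \infty} \frac{1}{\epsilon^{2} N}
\log P_\theta^{(N)}\!\left(\Delta(\theta_N^{\mathrm{ml}}, \theta) > \epsilon^{2}\right)
&= \varlimsup_{\epsilon \to 0}\, \varlimsup_{N \to \infty} \frac{1}{\epsilon^{2} N}
   \log \Big[ \sum_{j=1}^{3}
   P_\theta^{(N)}\!\left(f_N \in A_j \cap \mathcal{D}_{\theta,\epsilon}\right) \Big] \tag{A.37} \\
&= \varlimsup_{\epsilon \to 0}\, \varlimsup_{N \to \infty} \frac{1}{\epsilon^{2} N}
   \log P_\theta^{(N)}\!\left(f_N \in A_1 \cap \mathcal{D}_{\theta,\epsilon}\right) \tag{A.38} \\
&= \varlimsup_{\epsilon \to 0} \frac{1}{\epsilon^{2}}
   \Big[ -\inf_{p \in A_1 \cap \mathcal{D}_{\theta,\epsilon}} K(p \| p_\theta) \Big] \tag{A.39} \\
&= -\varliminf_{\epsilon \to 0} \frac{1}{\epsilon^{2}} \inf_{\theta' \in \Theta}
   \left\{ K(p_{\theta'} \| p_\theta);\ \Delta(\theta', \theta) > \epsilon^{2} \right\} \tag{A.40} \\
&= -\frac{1}{\sigma_1\!\left(\sqrt{H_\theta}\,F_\theta^{-1}\sqrt{H_\theta}\right)}, \tag{A.41}
\end{align}

where we used Lemma 3 in the last line. Because a maximum likelihood estimator satisfies both
Eqs. (A.16) and (A.41), it achieves the equality in Eq. (A.16), and Eq. (A.34) holds. □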

The final lemma relates the identifiability condition in classical statistical estimation theory
to informational completeness in quantum tomography. We assume now that the probability
distributions are given by quantum mechanics, Eq. (1), for finite dimensional systems.

Lemma 6 Let $\rho = \rho(s)$ denote a density operator parametrized by a vector $s \in S$. We
assume that the parametrization is one-to-one. Suppose that we perform quantum state tomography
with a POVM $\Pi = \{\Pi_x\}_{x \in \Omega}$. Then the following statements are equivalent.

1. The probability distribution describing the tomographic experiment satisfies the
identifiability condition.

2. The Fisher matrix $F_s$ is full rank for any $s \in S^\circ$.

3. The POVM is informationally complete.

Proof: First we show that it is sufficient to prove the equivalence of the three conditions in
Lemma 6 for a linear parametrization. In quantum mechanics, for a finite dimensional system, any
probability distribution is linearly one-to-one parametrizable, and we can assume that the
probability distribution has the form

\[
p_s(x) = v(x) + s \cdot w(x), \tag{A.42}
\]

where $v(x) \in \mathbb{R}$ and $w(x) \in \mathbb{R}^{d^2 - 1}$ satisfy
$\sum_{x \in \Omega} p_s(x) = 1$ for any $s \in S$. If the probability distribution is one-to-one
(but not necessarily linearly) parametrized by a different parameter
$t \in \mathbb{R}^{d^2 - 1}$, then we have

\begin{align}
p_t(x) &= p_{s(t)}(x), \tag{A.43} \\
\nabla_t\, p_t(x) &= \frac{\partial s}{\partial t}\, \nabla_s\, p_s(x). \tag{A.44}
\end{align}

Condition 1 for $s$ and condition 1 for $t$ are equivalent because both parametrizations are
one-to-one. Condition 2 for $s$ and condition 2 for $t$ are equivalent because the Fisher matrices
satisfy the equation

\[
F_t = \frac{\partial s}{\partial t}\, F_s \left(\frac{\partial s}{\partial t}\right)^{T}, \tag{A.45}
\]

and the Jacobian $\partial s / \partial t$ is full rank. Condition 3 is independent of state
parametrization. Therefore if conditions 1, 2, and 3 are equivalent for a linear parametrization,
then they are also equivalent for a general parametrization.

Next we prove the equivalence of conditions 1 and 2. As in the above discussion, without loss of
generality, we can assume that $s$ is the fixed parameter such that $F_s$ is diagonalized, because
this is a linear transformation of a general parameter. Under this assumption, condition 1 is
equivalent to the condition that for any $s \in S^\circ$ and for all
$\alpha = 1, \ldots, d^2 - 1$, there exists at least one $x \in \Omega$ such that

\[
\partial_\alpha\, p_s(x) \neq 0, \tag{A.46}
\]

where $\partial_\alpha := \partial / \partial s_\alpha$. On the other hand, the diagonal elements
of the Fisher matrix are

\[
F_{s,\alpha\alpha} = \sum_{x \in \Omega} \frac{(\partial_\alpha\, p_s(x))^{2}}{p_s(x)},
\quad \alpha = 1, \ldots, d^2 - 1. \tag{A.47}
\]

Therefore the full rankness of the Fisher matrix is equivalent to Eq. (A.46), and condition 1 and
condition 2 are equivalent.

Third we prove the equivalence between conditions 2 and 3. We choose the generalized Bloch
parametrization of density operators [27], [28]; any density operator $\rho$ can be represented as

\[
\rho(s) = \frac{1}{d} I + \frac{1}{2}\, s \cdot \vec{\sigma}, \tag{A.48}
\]

where $I$ is the identity operator on $\mathcal{H}$ and $\sigma_\alpha$ are generators of $SU(d)$
satisfying $\sigma_\alpha = \sigma_\alpha^{\dagger}$, $\mathrm{Tr}[\sigma_\alpha] = 0$, and
$\mathrm{Tr}[\sigma_\alpha \sigma_\beta] = 2\delta_{\alpha\beta}$
($\alpha, \beta = 1, \ldots, d^2 - 1$). To determine the representation uniquely, we need
additional conditions on $\vec{\sigma}$, but the additional conditions are not used in the
following discussion. Each element of the tester POVM can be represented as

\[
\Pi_x = v(x) I + w(x) \cdot \vec{\sigma}, \quad x \in \Omega, \tag{A.49}
\]

where $v(x)$ and $w(x)$ should satisfy $\sum_{x \in \Omega} v(x) = 1$,
$\sum_{x \in \Omega} w(x) = 0$, and $\Pi_x \ge 0$ for any $x \in \Omega$. Then, the probability
distribution describing the tomographic experiment is represented as Eq. (A.42), and the Fisher
matrix is

\begin{align}
F_s &= \sum_{x \in \Omega} p_s(x)\, \nabla_s \log p_s(x)\, \nabla_s \log p_s(x)^{T} \tag{A.50} \\
    &= \sum_{x \in \Omega} \frac{w(x)\, w(x)^{T}}{p_s(x)}. \tag{A.51}
\end{align}

Therefore the full rankness of the Fisher matrix is equivalent to the condition that
$\{w(x)\}_{x \in \Omega}$ spans $\mathbb{R}^{d^2 - 1}$, and this is equivalent to the tester POVM
$\Pi$ being informationally complete. □
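
The equivalence between conditions 2 and 3 can be illustrated for a qubit with a short sketch built on Eqs. (A.42) and (A.51). The two POVMs are illustrative assumptions, not the paper's tester: a six-state POVM measuring along three orthogonal axes, and a four-outcome POVM that measures only along $x$ and $z$ and therefore never probes $s_y$. The helper names and the Bloch vector `s` are hypothetical.

```python
def fisher_matrix(povm, s):
    """F_s = sum_x w(x) w(x)^T / p_s(x) with p_s(x) = v(x) + s . w(x),
    following Eqs. (A.42) and (A.51). `povm` is a list of (v, w) pairs with w
    a 3-vector (qubit case)."""
    F = [[0.0] * 3 for _ in range(3)]
    for v, w in povm:
        p = v + sum(si * wi for si, wi in zip(s, w))
        for a in range(3):
            for b in range(3):
                F[a][b] += w[a] * w[b] / p
    return F

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def axis_povm(axes):
    """Projective measurements along the given axes, each chosen with equal
    weight: Pi = (I +- sigma_axis) / (2 n), i.e. v = 1/(2n), w = +-axis/(2n)."""
    n = len(axes)
    povm = []
    for ax in axes:
        for sign in (+1.0, -1.0):
            povm.append((1.0 / (2 * n), tuple(sign * c / (2 * n) for c in ax)))
    return povm

six_state = axis_povm([(1, 0, 0), (0, 1, 0), (0, 0, 1)])   # w(x) spans R^3
two_axis = axis_povm([(1, 0, 0), (0, 0, 1)])               # w(x) misses the y axis

s = (0.2, 0.1, -0.4)    # hypothetical interior Bloch vector
det_six = det3(fisher_matrix(six_state, s))
det_two = det3(fisher_matrix(two_axis, s))

print(det_six, det_two)
```

A nonzero determinant signals a full rank Fisher matrix; for the two-axis POVM the row and column associated with $s_y$ vanish identically, so no estimator can recover that component.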



2. A more general theorem 



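From Eq. (A.16) and Eq. (A.34), we obtain the following theorem.

Theorem 2 Suppose that $\Delta$ on $\Theta$ is a sufficiently smooth pseudo-distance with a
non-zero same point Hesse matrix $H_\theta$. If $\theta \in \Theta^\circ$, for an arbitrary
consistent estimator $\theta^{\mathrm{est}}$, the following inequality holds:

\[
\varliminf_{\epsilon \to 0}\, \varliminf_{N \to \infty} \frac{1}{\epsilon^{2} N}
\log P_\theta^{(N)}\!\left(\Delta(\theta_N^{\mathrm{est}}, \theta) > \epsilon^{2}\right)
\ge -\frac{1}{\sigma_1\!\left(\sqrt{H_\theta}\,F_\theta^{-1}\sqrt{H_\theta}\right)}. \tag{A.52}
\]

Furthermore, when the identifiability condition is satisfied, a maximum likelihood estimator
$\theta^{\mathrm{ml}}$ is consistent and achieves the equality in Eq. (A.52), i.e.,

\[
\lim_{\epsilon \to 0}\, \lim_{N \to \infty} \frac{1}{\epsilon^{2} N}
\log P_\theta^{(N)}\!\left(\Delta(\theta_N^{\mathrm{ml}}, \theta) > \epsilon^{2}\right)
= -\frac{1}{\sigma_1\!\left(\sqrt{H_\theta}\,F_\theta^{-1}\sqrt{H_\theta}\right)} \tag{A.53}
\]

holds.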



Theorem 2 is in fact more general than Theorem 1, since identifiability is more general than
informational completeness. Hence, the properties that the error probabilities of consistent
estimators can decrease at most exponentially, that the rate of decrease is bounded by the maximal
eigenvalue of a matrix, and that the bound is achievable by a maximum likelihood estimator are
common to a larger class of probability distributions than those of quantum mechanics.

By applying Theorem 2 to quantum state tomography and using Lemma 6, we can obtain Theorem 1.
Theorem 2 is also applicable to other types of quantum tomography. The conditions corresponding to
the identifiability condition are different, and can be derived in the same way as in the proof of
Lemma 6. For example, let us consider ancilla-unassisted quantum process tomography. To identify
an unknown quantum process described by a linear, completely positive, and trace-preserving map
$\kappa$ on $\mathcal{S}(\mathcal{H})$, we prepare a set of input states
$\boldsymbol{\rho} = \{\rho_n\}_{n}$, where $\rho_n \in \mathcal{S}(\mathcal{H})$, and a
measurement described by a POVM $\Pi = \{\Pi_x\}_{x \in \Omega}$ on $\mathcal{H}$. The set
$\{\boldsymbol{\rho}, \Pi\}$ is the tester for ancilla-unassisted process tomography. When
$\boldsymbol{\rho}$ spans $\mathcal{S}(\mathcal{H})$, it is called tomographically complete. In
ancilla-unassisted process tomography, both the informational completeness of $\Pi$ and the
tomographic completeness of $\boldsymbol{\rho}$ are required. We can prove that these conditions
are equivalent to the identifiability condition in the same way as in Lemma 6.
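
Tomographic completeness of a set of input states reduces to a rank condition, which the following sketch checks for a qubit. It is an illustration under assumptions: each state is represented by the hypothetical 4-vector $(1, s_x, s_y, s_z)$ of its coefficients with respect to $I/2$ and $\sigma_\alpha/2$, so the set spans the operator space exactly when these vectors have rank $d^2 = 4$. The `rank` helper and the two example sets are not from the paper.

```python
def rank(rows, tol=1e-12):
    """Rank of a small real matrix via Gaussian elimination with partial pivoting."""
    m = [list(map(float, r)) for r in rows]
    rk = 0
    n_cols = len(m[0])
    for col in range(n_cols):
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rk, len(m)), key=lambda i: abs(m[i][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][col] / m[rk][col]
            for j in range(col, n_cols):
                m[i][j] -= factor * m[rk][j]
        rk += 1
        if rk == len(m):
            break
    return rk

# Rows are (1, s_x, s_y, s_z) for each input state.
complete_set = [(1, 0, 0, 1),    # |0>
                (1, 0, 0, -1),   # |1>
                (1, 1, 0, 0),    # |+>
                (1, 0, 1, 0)]    # |+i>
incomplete_set = [(1, 0, 0, 1),  # |0>
                  (1, 0, 0, -1), # |1>
                  (1, 1, 0, 0),  # |+>
                  (1, -1, 0, 0)] # |->

print(rank(complete_set), rank(incomplete_set))
```

The second set has four states but rank 3, because none of them has a $\sigma_y$ component, so counting states is not enough to establish completeness.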

For the case where the same point Hesse matrix of the loss function is positive semidefinite, as
mentioned in subsection IV E, the identifiability condition is modified as follows: for any
$\theta \in \Theta^\circ$ and $\theta' \in \Theta$, if $g(\theta) \neq g(\theta')$, then there
exists at least one outcome $x \in \Omega$ satisfying $p_\theta(x) \neq p_{\theta'}(x)$. Theorem 2
holds for this modification.